External Plagiarism Detection using Information Retrieval and Sequence Alignment - Notebook for PAN at CLEF 2011
Abstract
This paper describes the University of Sheffield entry for the 3rd International Competition on Plagiarism Detection, which attempted the monolingual external plagiarism detection task. A three-stage framework was used: preprocessing and indexing, candidate document selection (using an Information Retrieval-based approach) and detailed analysis (using the Running Karp-Rabin Greedy String Tiling algorithm). In the formal evaluation, the submitted system obtained an overall score of 0.0804, with precision of 0.2780, recall of 0.0885 and granularity of 2.18.
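The abstract does not say how the overall score is derived from the other three figures. A minimal sketch follows, assuming the overall score is the plagdet measure used in the PAN evaluations: the harmonic mean of precision and recall, divided by log2(1 + granularity). The function name `plagdet` is chosen here for illustration and is not taken from the paper.

```python
import math


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Combine precision, recall and granularity into one score.

    Assumption: this follows the PAN plagdet definition,
    F1 / log2(1 + granularity), with granularity >= 1.
    """
    if precision + recall == 0:
        return 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return f1 / math.log2(1 + granularity)


# Figures reported in the abstract above.
score = plagdet(precision=0.2780, recall=0.0885, granularity=2.18)
print(f"{score:.4f}")
```

Under this assumption the reported precision, recall and granularity reproduce the reported overall score to four decimal places. A granularity of 2.18 means each plagiarised passage was reported as roughly two separate detections on average, and the logarithmic denominator penalises that fragmentation.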
Similar resources
Approaches for Source Retrieval and Text Alignment of Plagiarism Detection Notebook for PAN at CLEF 2013
In this paper, we describe our approach to the PAN@CLEF2013 plagiarism detection competition. For the Source Retrieval sub-task, we propose a method that combines TF-IDF, PatTree and Weighted TF-IDF to extract keywords from suspicious documents, which are then used as queries to retrieve the plagiarism source documents. For the Text Alignment sub-task, a method based on sentence similarity is presented. Our text alignment...
Rule Based Plagiarism Detection using Information Retrieval - Notebook for PAN at CLEF 2011
This paper reports on the development of a plagiarism detection system as part of the plagiarism detection task in PAN 2011. The external plagiarism detection problem has been solved with the help of Nutch, an open source Information Retrieval (IR) system. The system contains three phases: knowledge preparation, candidate retrieval and plagiarism detection. From the source documents, know...
External & Intrinsic Plagiarism Detection: VSM & Discourse Markers based Approach - Notebook for PAN at CLEF 2011
This paper explains the performance of a plagiarism detection system that can detect external as well as intrinsic plagiarism in text. It reports results on the PAN-PC-2011 test corpus. We investigated Vector Space Model-based techniques for detecting external plagiarism cases and discourse marker-based features for detecting intrinsic plagiarism cases.
Approaches for Intrinsic and External Plagiarism Detection - Notebook for PAN at CLEF 2011
Plagiarism detection has been treated as a classification problem that can be approached with intrinsic strategies, which use information from within a given document, and external strategies, which compare a suspicious document against different sources. In this work, both intrinsic and external approaches for plagiarism detection are presented. First, the m...
External and Intrinsic Plagiarism Detection Using a Cross-Lingual Retrieval and Segmentation System - Lab Report for PAN at CLEF 2010
We present our hybrid system for the PAN challenge at CLEF 2010. Our system performs plagiarism detection for translated and non-translated externally as well as intrinsically plagiarized document passages. Our external plagiarism detection approach is formulated as an information retrieval problem, using heuristic post processing to arrive at the final detection results. For the retrieval step...